
Conversation

@aldy505
Collaborator

@aldy505 aldy505 commented Oct 4, 2025

DESCRIBE YOUR PR

Turns out most of our self-hosted users have never touched Kafka before, so it's a good idea to introduce them to how Kafka works.

Also added how to increase consumer replicas if they're lagging behind.
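For illustration only: on a docker-compose based self-hosted install, scaling a lagging consumer might look like the following. The service name `events-consumer` is a placeholder, not necessarily a real service name from the docs.

```shell
# Hypothetical example: run 3 replicas of a lagging consumer service.
# Replace "events-consumer" with the actual service name from your
# docker-compose.yml, and make sure the topic has at least 3 partitions,
# otherwise the extra replicas will sit idle.
docker compose up -d --scale events-consumer=3
```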

IS YOUR CHANGE URGENT?

Help us prioritize incoming PRs by letting us know when the change needs to go live.

  • Urgent deadline (GA date, etc.):
  • Other deadline:
  • None: Not urgent, can wait up to 1 week+

SLA

  • Teamwork makes the dream work, so please add a reviewer to your PRs.
  • Please give the docs team up to 1 week to review your PR unless you've added an urgent due date to it.
    Thanks in advance for your help!

PRE-MERGE CHECKLIST

Make sure you've checked the following before merging your changes:

  • Checked Vercel preview for correctness, including links
  • PR was reviewed and approved by any necessary SMEs (subject matter experts)
  • PR was reviewed and approved by a member of the Sentry docs team

LEGAL BOILERPLATE

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


@aldy505 aldy505 requested review from BYK and hubertdeng123 October 4, 2025 03:31
aldy505 and others added 2 commits October 4, 2025 20:37
Co-authored-by: Kevin Pfeifer <[email protected]>
Co-authored-by: Kevin Pfeifer <[email protected]>
Member

@BYK BYK left a comment


This looks waaaay better than the old version but I'm not qualified to give proper feedback. Still unblocking as I think it is miles better than whatever we have currently.

This section is aimed at those who have Kafka problems but are not yet familiar with Kafka. At a high level, it is a message broker which stores messages in a log (or in easier language: very similar to an array) format. It receives messages from producers aimed at a specific topic, and then sends them to consumers that are subscribed to that topic. The consumers can then process the messages.

This happens when Kafka and the consumers get out of sync. Possible reasons are:
On the inside, when a message enters a topic, it is written to a certain partition. You can think of a partition as a physical box that stores messages for a specific topic; each topic has its own separate, dedicated partitions. In a distributed Kafka setup, each partition might be stored on a different machine/node, but if you only have a single Kafka instance, all the partitions are stored on the same machine.
Member


This paragraph has a bit of overlap with the next one and does not add much to understanding Kafka IMO, so I think we could remove it completely.

Collaborator Author


I don't mind leaving this here. At least I want to emphasize the existence of "partition"

aldy505 and others added 2 commits October 7, 2025 17:17
```log
Exception: KafkaError{code=OFFSET_OUT_OF_RANGE,val=1,str="Broker: Offset out of range"}
```
This section is aimed at those who have Kafka problems but are not yet familiar with Kafka. At a high level, Kafka is a message broker which stores messages in a log (or in easier language: very similar to an array) format. It receives messages from producers that write to a specific topic, and then sends them to consumers that are subscribed to that topic. The consumers can then process the messages.
Member


If you want to be pedantic, Kafka doesn't send messages to consumers. Consumers poll Kafka and fetch new messages.

Collaborator Author


Too pedantic and not so beginner friendly 😅
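A minimal, illustrative sketch of the model under discussion: a topic as an append-only log (an "array" of messages), with consumers that poll for messages and track their own position. The class names here are invented for illustration and are not the real Kafka client API.

```python
# Toy model of a Kafka topic as an append-only log. Not the real client
# API; just the mental model: topic = array, offset = index, poll = fetch.

class Topic:
    def __init__(self):
        self.log = []  # messages live here in arrival order, like an array

    def produce(self, message):
        self.log.append(message)

class Consumer:
    """Consumers poll the broker; the broker does not push to them."""
    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # index of the next message to read

    def poll(self, max_messages=10):
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

events = Topic()
for i in range(5):
    events.produce(f"event-{i}")

consumer = Consumer(events)
print(consumer.poll(3))  # ['event-0', 'event-1', 'event-2']
print(consumer.poll(3))  # ['event-3', 'event-4']
```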

1. Running out of disk space or memory
2. Having a sustained event spike that causes very long processing times, causing Kafka to drop messages as they go past the retention time
3. Date/time out of sync issues due to a restart or suspend/resume cycle
When a producer sends a message to a topic, it will either stick to a certain partition number (for example: partition 1, partition 2, etc.) or it will randomly choose a partition. A consumer will then subscribe to a topic and will automatically be assigned one or more partitions by Kafka. The consumer will then start receiving messages from the assigned partitions. One very important aspect to note is that **the number of consumers within a consumer group must not exceed the number of partitions for a given topic**. If you have more consumers than partitions, the extra consumers will hang with no messages to consume.
Member


If we use semantic partitioning, we don't generally assign partition numbers to messages. Instead we use 'keyed messages', which define how messages are grouped (by key) but how those keys map to partitions is up to Kafka.

Member


Adding on this, if messages don't have partition keys, Kafka will assign the message to a partition via round-robin (so technically not randomly)

Collaborator Author


Good point. Will update these.
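A toy sketch of the partitioning behavior discussed in this thread, with invented helper names: keyed messages hash to a stable partition, unkeyed messages are spread round-robin, and any consumer in a group beyond the partition count ends up idle.

```python
# Illustrative only: real Kafka clients handle all of this internally.
from itertools import count

NUM_PARTITIONS = 3
_round_robin = count()

def choose_partition(key=None):
    if key is not None:
        # Keyed message: the same key always lands on the same partition.
        return hash(key) % NUM_PARTITIONS
    # Unkeyed message: spread across partitions round-robin, not randomly.
    return next(_round_robin) % NUM_PARTITIONS

def assign(partitions, consumers):
    """Divide partitions among a consumer group's members."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 consumers, 3 partitions: the 4th consumer gets nothing to do.
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```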


Each message in a topic has an "offset" (a number), which easily translates to an "index" in an array. The offset is used by the consumer to track where it is in the log and which message it consumed last. If the consumer is not able to keep up with the producer, it will start to lag behind. Most of the time, we want "lag" to be as low as possible, meaning we don't want many unprocessed messages. The easy solution is adding more partitions and increasing the number of consumers.
Member


Do you want to mention that offsets are scoped to a partition, and that each partition in a topic will have the same offset numbers?

Collaborator Author


Yes I should be mentioning that
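A small sketch of how lag could be computed, reflecting the point above that offsets are scoped to a partition (the function name is illustrative): lag is the gap between the newest offset in each partition and the consumer's committed offset there.

```python
# Toy lag calculation. "High watermark" is the newest offset in a
# partition; lag is how far behind the consumer's committed offset is.
def consumer_lag(high_watermarks, committed_offsets):
    """Both arguments map partition -> offset; returns partition -> lag."""
    return {p: high_watermarks[p] - committed_offsets.get(p, 0)
            for p in high_watermarks}

lag = consumer_lag(
    high_watermarks={0: 1200, 1: 1180, 2: 1210},
    committed_offsets={0: 1200, 1: 900, 2: 1210},
)
print(lag)                # {0: 0, 1: 280, 2: 0} -- partition 1 is behind
print(sum(lag.values()))  # 280 total unprocessed messages
```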


This happens when Kafka and the consumers get out of sync. Possible reasons are:

1. Running out of disk space or memory
Member


I could be wrong, but I think if Kafka runs out of disk space the service crashes (which wouldn't cause offset out of range on consumers)

Collaborator Author


Technically speaking, you're right. But after a disk-out-of-space incident, there will potentially be massive offset out of range errors.

Perhaps this should be made clearer.


Contributor

@sfanahata sfanahata left a comment


There is some wording that could use cleaning up. I left a few suggestions. Overall, I love the additional information about partitions and consumers.


@aldy505 aldy505 enabled auto-merge (squash) November 2, 2025 02:51
@aldy505 aldy505 merged commit ffca9d0 into master Nov 2, 2025
12 checks passed
@aldy505 aldy505 deleted the aldy505/self-hosted/troubleshooting-kafka branch November 2, 2025 02:55